340. Longest Substring with At Most K Distinct Characters
Medium

Given a string s and an integer k, return the length of the longest substring of s that contains at most k distinct characters.

 

Example 1:

Input: s = "eceba", k = 2
Output: 3
Explanation: The substring is "ece" with length 3.

Example 2:

Input: s = "aa", k = 1
Output: 2
Explanation: The substring is "aa" with length 2.

 

Constraints:

  • 1 <= s.length <= 5 * 104
  • 0 <= k <= 50
Accepted
202,899
Submissions
441,553
Seen this question in a real interview before?
Companies

Average Rating: 3.92 (73 votes)

Premium

Video Solution


 

Solution Article


Approach 1: Sliding Window + Hashmap.

Intuition

We could take some inspiration from a simpler problem called longest substring with at most two distinct characters.

To solve the problem in one pass let's use here sliding window approach with two set pointers left and right serving as the window boundaries.

The idea is to set both pointers in the position 0 and then move right pointer to the right while the window contains not more than k distinct characters. If at some point we've got k + 1 distinct characters, let's move left pointer to keep not more than k + 1 distinct characters in the window.

compute

Basically that's the algorithm : to move sliding window along the string, to keep not more than k distinct characters in the window, and to update max substring length at each step.

There is just one more question to reply - how to move the left pointer to keep only k distinct characters in the string?

Let's use for this purpose hashmap containing all characters in the sliding window as keys and their rightmost positions as values. At each moment, this hashmap could contain not more than k + 1 elements.

compute

For example, using this hashmap one knows that the rightmost position of character O in "LOVELEE" window is 1 and so one has to move left pointer in the position 1 + 1 = 2 to exclude the character O from the sliding window.

Algorithm

Now one could write down the algortihm.

  • Return 0 if the string is empty or k is equal to zero.
  • Set both set pointers in the beginning of the string left = 0 and right = 0 and init max substring length max_len = 1.
  • While right pointer is less than N:
    • Add the current character s[right] in the hashmap and move right pointer to the right.
    • If hashmap contains k + 1 distinct characters, remove the leftmost character from the hashmap and move the left pointer so that sliding window contains again k distinct characters only.
    • Update max_len.

Implementation

Current
1 / 18

Complexity Analysis

Do we have here the best possible time complexity O(N)\mathcal{O}(N) as it was for more simple problem with at most two distinct characters?

For the best case when input string contains not more than k distinct characters the answer is yes. It's the only one pass along the string with N characters and the time complexity is O(N)\mathcal{O}(N).

For the worst case when the input string contains n distinct characters, the answer is no. In that case at each step one uses O(k)\mathcal{O}(k) time to find a minimum value in the hashmap with k elements and so the overall time complexity is O(Nk)\mathcal{O}(N k).

  • Time complexity : O(N)\mathcal{O}(N) in the best case of k distinct characters in the string and O(Nk)\mathcal{O}(N k) in the worst case of N distinct characters in the string.

  • Space complexity : O(k)\mathcal{O}(k) since additional space is used only for a hashmap with at most k + 1 elements.


Approach 2: Sliding Window + Ordered Dictionary.

How to achieve O(N)\mathcal{O}(N) time complexity

Approach 1 with a standard hashmap couldn't ensure O(N)\mathcal{O}(N) time complexity.

To have O(N)\mathcal{O}(N) algorithm performance, one would need a structure, which provides four operations in O(1)\mathcal{O}(1) time :

  • Insert the key

  • Get the key and check if the key exists

  • Delete the key

  • Return the first or last added key/ value

The first three operations in O(1)\mathcal{O}(1) time are provided by the standard hashmap, and the forth one - by linked list.

There is a structure called ordered dictionary, it combines behind both hashmap and linked list. In Python this structure is called OrderedDict and in Java LinkedHashMap.

Ordered dictionary is quite popular for interviews. for example, check out the Implementing a LRU Cache question by Google.

Algorithm

Let's use ordered dictionary instead of standard hashmap to trim the algorithm from approach 1 :

  • Return 0 if the string is empty or k is equal to zero.
  • Set both pointers to the beginning of the string left = 0 and right = 0 and initialize max substring length max_len = 1.
  • While right pointer is less than N:
    • If the current character s[right] is already in the ordered dictionary hashmap -- delete it, to ensure that the first key in hashmap is the leftmost character.
    • Add the current character s[right] in the ordered dictionary and move right pointer to the right.
    • If ordered dictionary hashmap contains k + 1 distinct characters, remove the leftmost one and move the left pointer so that sliding window contains again k distinct characters only.
    • Update max_len.

Implementation

Complexity Analysis

  • Time complexity : O(N)\mathcal{O}(N) since all operations with ordered dictionary : insert/get/delete/popitem (put/containsKey/remove) are done in a constant time.

  • Space complexity : O(k)\mathcal{O}(k) since additional space is used only for an ordered dictionary with at most k + 1 elements.

Report Article Issue

Comments: 51
playboy9817's avatar
Read More

don't store the index, instead, store the count of each character, once right point sees a new char (count==0), we increase the number of distinct character by 1, left pointer keeps decreasing the count, once count reaches 0, we decrease it. ordered dict is totally unnecessary

from collections import Counter

class Solution(object):
    def lengthOfLongestSubstringKDistinct(self, s, k):
        counter = Counter()
        res, i, n = 0, 0, 0
        for j, c in enumerate(s):
            if counter[c] == 0:  # never seen before
                n += 1
            counter[c] += 1
            while n > k:
                counter[s[i]] -= 1
                if counter[s[i]] == 0:
                    n -= 1
                i += 1
            res = max(res, j - i + 1)
        return res

79
Show 10 replies
Reply
Share
Report
jack103's avatar
Read More

I don't get why solution 1's time complexity is not just O(N). You have 2 pointers. Right pointer takes n steps. Left pointer can take at most n steps because it's always trailing right. You can argue the time complexity is O(2N), which is O(N), but you cannot say it's O(Nk).

40
Show 6 replies
Reply
Share
Report
crisa's avatar
Read More

Couldn't you store the frequency of each character in the map instead of its rightMost position? That way, when the size of the map exceeds k, you can find the character at the left pointer in the map, and decrement its frequency by 1. Then increment the left pointer (contracting the window). Keep doing this until a character's frequency reaches 0, and remove that character from the map. Wouldn't this achieve O(n) complexity?

20
Show 3 replies
Reply
Share
Report
milinthosani's avatar
Read More

I think my solution is O(n) in any worst case. Let me know if you think it is not. Will appreciate constructive criticism.

class Solution(object):
    def lengthOfLongestSubstringKDistinct(self, s, k):
        d = {}
        i = j = max_len = 0
        while j < len(s):
            d[s[j]] = d.get(s[j], 0) + 1
            if len(d) > k:
                d[s[i]] -= 1
                if d[s[i]] == 0:
                    del d[s[i]]
                i += 1
            max_len = max(max_len, j - i + 1)
            j += 1
        return max_len

7
Reply
Share
Report
zan00's avatar
Read More

Great explanation, Thanks.

7
Reply
Share
Report
ts1992's avatar
Read More

For solution 2, if you used access-ordered LInkedHashMap, it could save you some time from reinserting entries every time (line 17 -21). The constructor looks like this:
Map<Character, Integer> map = new LinkedHashMap<>(k + 1, 1, true);

5
Reply
Share
Report
gevorgk's avatar
Read More

Both solutions are overcomplicated. Simple hashtable with counts does the job

class Solution {
public:
    int lengthOfLongestSubstringKDistinct(const string& s, int k) {
        if(k < 1)
            return 0;
        
        unordered_map<char, int> stats;
        int start = 0, end = 0;
        int maxLen = 0;
        
        for(; end < s.size(); ++end) {
            ++stats[s[end]];
            while(stats.size() > k) {
                if(--stats[s[start]] == 0) {
                    stats.erase(s[start]);
                 }

                 ++start;
            }
            
            maxLen = max(maxLen, end-start+1);
        }
                               
        return maxLen;
    }
};

7
Reply
Share
Report
yuhwu's avatar
Read More

You can definitely reach O(N) complexity with sliding window with two pointers, please refer to the following code.

    public int lengthOfLongestSubstringKDistinct(String s, int k) {
        Integer r=0,l=0,res=0,curDistinct=0;
        int[] freq = new int[256];
        while(r<s.length()){
            if(++freq[s.charAt(r++)]==1) curDistinct++;
            while(curDistinct>k) if(--freq[s.charAt(l++)]==0) curDistinct--;
            res = Math.max(r-l, res);
        }
        return res;
    }

11
Show 4 replies
Reply
Share
Report
ellisa_khoja's avatar
Read More

Awesome article. Very intuitive ;)

3
Reply
Share
Report
migeater's avatar
Read More

Why do we need defaultdict? It seems you hasn't use its default value.
And after python3.7

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.

2
Reply
Share
Report

You don't have any submissions yet.

340/1883
Contribute
|||
Saved
#301 Remove Invalid Parentheses
Hard
#302 Smallest Rectangle Enclosing Black Pixels
Hard
#303 Range Sum Query - Immutable
Easy
#304 Range Sum Query 2D - Immutable
Medium
#305 Number of Islands II
Hard
#306 Additive Number
Medium
#307 Range Sum Query - Mutable
Medium
#308 Range Sum Query 2D - Mutable
Hard
#309 Best Time to Buy and Sell Stock with Cooldown
Medium
#310 Minimum Height Trees
Medium
#311 Sparse Matrix Multiplication
Medium
#312 Burst Balloons
Hard
#313 Super Ugly Number
Medium
#314 Binary Tree Vertical Order Traversal
Medium
#315 Count of Smaller Numbers After Self
Hard
#316 Remove Duplicate Letters
Medium
#317 Shortest Distance from All Buildings
Hard
#318 Maximum Product of Word Lengths
Medium
#319 Bulb Switcher
Medium
#320 Generalized Abbreviation
Medium
#321 Create Maximum Number
Hard
#322 Coin Change
Medium
#323 Number of Connected Components in an Undirected Graph
Medium
#324 Wiggle Sort II
Medium
#325 Maximum Size Subarray Sum Equals k
Medium
#326 Power of Three
Easy
#327 Count of Range Sum
Hard
#328 Odd Even Linked List
Medium
#329 Longest Increasing Path in a Matrix
Hard
#330 Patching Array
Hard
#331 Verify Preorder Serialization of a Binary Tree
Medium
#332 Reconstruct Itinerary
Medium
#333 Largest BST Subtree
Medium
#334 Increasing Triplet Subsequence
Medium
#335 Self Crossing
Hard
#336 Palindrome Pairs
Hard
#337 House Robber III
Medium
#338 Counting Bits
Easy
#339 Nested List Weight Sum
Medium
#340 Longest Substring with At Most K Distinct Characters
Medium
#341 Flatten Nested List Iterator
Medium
#342 Power of Four
Easy
#343 Integer Break
Medium
#344 Reverse String
Easy
#345 Reverse Vowels of a String
Easy
#346 Moving Average from Data Stream
Easy
#347 Top K Frequent Elements
Medium
#348 Design Tic-Tac-Toe
Medium
#349 Intersection of Two Arrays
Easy
#350 Intersection of Two Arrays II
Easy